Classification of DNA sequences using Bloom filters
نویسندگان
چکیده
MOTIVATION New generation sequencing technologies producing increasingly complex datasets demand new efficient and specialized sequence analysis algorithms. Often, it is only the 'novel' sequences in a complex dataset that are of interest and the superfluous sequences need to be removed. RESULTS A novel algorithm, fast and accurate classification of sequences (FACSs), is introduced that can accurately and rapidly classify sequences as belonging or not belonging to a reference sequence. FACS was first optimized and validated using a synthetic metagenome dataset. An experimental metagenome dataset was then used to show that FACS achieves comparable accuracy as BLAT and SSAHA2 but is at least 21 times faster in classifying sequences. AVAILABILITY Source code for FACS, Bloom filters and MetaSim dataset used is available at http://facs.biotech.kth.se. The Bloom::Faster 1.6 Perl module can be downloaded from CPAN at http://search.cpan.org/ approximately palvaro/Bloom-Faster-1.6/ CONTACTS [email protected]; [email protected] SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
منابع مشابه
Comparing Binary Iris Biometric Templates Based on Counting Bloom Filters
In this paper a binary biometric comparator based on Counting Bloom filters is introduced. Within the proposed scheme binary biometric feature vectors are analyzed and appropriate bit sequences are mapped to Counting Bloom filters. The comparison of resulting sets of Counting Bloom filters significantly improves the biometric performance of the underlying system. The proposed approach is applie...
متن کاملBioBloom tools: fast, accurate and memory-efficient host species sequence screening using bloom filters
Large datasets can be screened for sequences from a specific organism, quickly and with low memory requirements, by a data structure that supports time- and memory-efficient set membership queries. Bloom filters offer such queries but require that false positives be controlled. We present BioBloom Tools, a Bloom filter-based sequence-screening tool that is faster than BWA, Bowtie 2 (popular ali...
متن کاملA Cuckoo Filter Modification Inspired by Bloom Filter
Probabilistic data structures are so popular in membership queries, network applications, and so on. Bloom Filter and Cuckoo Filter are two popular space efficient models that incorporate in set membership checking part of many important protocols. They are compact representation of data that use hash functions to randomize a set of items. Being able to store more elements while keeping a reaso...
متن کاملAn Approximate Duplicate-Elimination in RFID Data Streams Based on d-Left Time Bloom Filter
Article history: Received 6 March 2010 Received in revised form 16 July 2011 Accepted 18 July 2011 Available online 31 July 2011 The RFID technology has been applied to a wide range of areas since it does not require contact in detecting RFID tags. However, due to the multiple readings in many cases in detecting an RFID tag and the deployment of multiple readers, RFID data contains many duplica...
متن کاملA Cache Architecture for Counting Bloom Filters: Theory and Application
Within packet processing systems, lengthy memory accesses greatly reduce performance. To overcome this limitation, network processors utilize many different techniques, for example, utilizing multilevel memory hierarchies, special hardware architectures, and hardware threading. In this paper, we introduce a multilevel memory architecture for counting Bloom filters. Based on the probabilities of...
متن کامل